Abstract. In a society of multiple individuals, if everybody is interested only in maximizing his own payoff, will there exist any equilibrium for the society? John Nash proved more than 50 years ago that an equilibrium always exists such that nobody would benefit from unilaterally changing his strategy. Nash equilibrium is a central concept in game theory, which offers the mathematical foundation for social science and economics. However, the original definition is declarative: it does not include a method for finding such an equilibrium. It was later found that finding a Nash equilibrium is computationally difficult. Furthermore, a Nash equilibrium may be unstable, sensitive to the smallest variation of the payoff functions. Making the situation worse, a society of selfish individuals can have an enormous number of equilibria, making it extremely hard to find the globally optimal one. This paper offers a constructive generalization of Nash equilibrium that covers the case in which the selfishness of individuals is reduced to lower levels in a controllable way. It shows that the society has one and only one equilibrium when the selfishness is reduced to a certain level. When every individual follows the iterative, soft-decision optimization process presented in this paper, the society converges to the unique equilibrium at an exponential rate under any initial conditions. When it is also a consensus equilibrium, it must be the global optimum. This study suggests that, to build a good, stable society (including the financial market) for the benefit of everyone in it, the pursuit of maximal payoff by each individual should be controlled at some level, either by voluntary good citizenship or by proper regulations.



1 Introduction 



In 1950, John Nash proved, using the Kakutani fixed point theorem, that every n-player normal-form game [1] has at least one equilibrium. In an n-player normal-form game, each player has a finite number of actions and chooses one strategy for playing them. If a player takes one of the actions deterministically, this is called a pure strategy. If instead a player takes any one of the actions according to some probability distribution defined over the actions, this is called a mixed strategy. At a Nash equilibrium, each player has chosen a strategy (pure or mixed), and no player can benefit by unilaterally changing his or her strategy while the other players keep theirs unchanged.
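As an illustration of this definition, the sketch below checks whether a given profile of mixed strategies is a Nash equilibrium of a two-player game by comparing each player's best pure-action payoff with its expected payoff. The payoff tables (matching pennies) and the function names are hypothetical, chosen only for illustration.

```python
def action_payoffs(payoff, opponent_mix):
    """Expected payoff of each pure action against the opponent's mixed strategy.

    payoff[a][b] is this player's payoff when it plays a and the opponent plays b.
    """
    return [sum(u * q for u, q in zip(row, opponent_mix)) for row in payoff]


def nash_gap(payoff, own_mix, opponent_mix):
    """Best pure-action payoff minus the expected payoff of the player's own mix."""
    per_action = action_payoffs(payoff, opponent_mix)
    expected = sum(u * p for u, p in zip(per_action, own_mix))
    return max(per_action) - expected


def is_nash(A, B, p, q, tol=1e-9):
    """(p, q) is a Nash equilibrium iff neither player gains by a unilateral deviation.

    A[a][b] and B[a][b] are the payoffs to players 1 and 2 when player 1
    plays a and player 2 plays b.
    """
    B_own = [list(col) for col in zip(*B)]  # re-index player 2's payoffs as [own][opponent]
    return nash_gap(A, p, q) <= tol and nash_gap(B_own, q, p) <= tol


# Matching pennies: player 1 wins on a match, player 2 wins on a mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(is_nash(A, B, [0.5, 0.5], [0.5, 0.5]))  # the mixed profile
print(is_nash(A, B, [1.0, 0.0], [1.0, 0.0]))  # a pure profile player 2 would abandon
```

Checking only pure deviations is enough here because a player's expected payoff is linear in its own mixed strategy, so no mixed deviation can beat the best pure one.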



Nash equilibrium is arguably the most important concept in game theory, with significant impact on many other fields such as social science, economics, and computer science. It is an elegant theory for understanding a very important scenario in game playing.

However, the original definition is not constructive: it does not offer a method for finding an equilibrium. Recent studies have found that finding a Nash equilibrium is computationally hard (PPAD-complete) [2,3], even for 2-player games. State-of-the-art algorithms include Lemke-Howson [5] for 2-player games, and Simplicial Subdivision [6] and Govindan-Wilson [7] for N-player games.

A Nash equilibrium may not be stable. A mixed-strategy equilibrium is always very sensitive to perturbations and computing errors: the smallest change in a utility function or the slightest round-off error could knock the players out of an equilibrium with mixed strategies. Furthermore, an N-player game may have a huge number of Nash equilibria, growing exponentially with the number of players. The players can be trapped in one equilibrium or another, depending sensitively on initial conditions and perturbations. Finding the optimal one turns out to be an NP-hard problem.

Often, the memory, information exchange, and computing power of real living beings in a society are imperfect and limited. We can imagine that it is not an easy task for them to reach a Nash equilibrium. The Nash equilibrium is defined by selfish individuals trying to maximize their own payoffs. Our experience tells us that a society of selfish individuals may not be able to yield good payoffs to everyone in it. Such a society could be unstable, quickly swaying from one state to another and never being able to reach an equilibrium. Could we build a good, efficient, and stable society simply by reducing the selfishness of the individuals in it?

Conventional wisdom tells us that if each of us gives away a bit more in favor of others, we could end up with more gains in return. That is, reduced selfishness leads to better payoffs for the individuals in a society. For instance, if we, as drivers, respect other drivers sharing the same road and show consideration for each other, voluntarily and/or by following traffic laws, then each of us will end up with a faster, safer drive to our destination than when everyone is interested only in maximizing his own speed to the destination.

This paper offers a constructive generalization of Nash equilibrium along the line of reducing selfishness. It is based on a recently discovered general global optimization method called cooperative optimization [8,9,10]. Cooperation is a ubiquitous phenomenon in nature. The cooperative optimization theory is a mathematical theory for understanding cooperative behaviors and translating them into optimization algorithms.

2 A Constructive Generalization 

There is a fundamental difference between cooperative optimization and many classical optimization methods. It lies at the very core of optimization, namely the way decisions are made when assigning values to decision variables. Classical methods often make precise decisions when assigning variables at a given time instance of the optimization, such as x = 3 at time instance t. Such an assignment is precise in the sense that x can only take the value 3 and no other. In contrast, cooperative optimization makes soft decisions, represented by probability-like functions called assignment functions, such as $\Psi(x, t)$ at time instance t. It says that at time instance t, the variable x can take any value, with the likelihood measured by the function value $\Psi(x, t)$. A value with a higher function value is more likely to be assigned to the variable than a value with a lower function value.

If the function $\Psi(x, t)$ at time t is peaked at a specific value, say x = 3, then the soft decision falls back to the classic precise decision, i.e., assigning the value 3 to the variable x (x = 3). Hence, soft decision making is a generalization of classic precise decision making.

Let $E(x_1, x_2, \ldots, x_n)$ (or simply $E(x)$) be a multivariate objective function of n variables. Assume that $E(x)$ can be decomposed into n sub-objective functions $E_i(x)$, one for each variable, such that these sub-objective functions satisfy

$$E_1(x) + E_2(x) + \cdots + E_n(x) = E(x),$$

and/or the maximization of $E_i(x)$ with respect to $x_i$ also leads to the maximization of $E(x)$ for any i.
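A common way to obtain such a decomposition, sketched here under assumed toy terms, is to give each agent its own unary term plus an equal share of every pairwise term it participates in; the shares then add back up to $E(x)$. The objective below is hypothetical and chosen only to make the identity checkable with exact arithmetic.

```python
from fractions import Fraction
from itertools import product

DOMAIN = (0, 1, 2)
N = 3

# Hypothetical objective on a chain x0 - x1 - x2: unary terms plus pairwise terms.
UNARY = [lambda v: Fraction(v), lambda v: Fraction(-v), lambda v: Fraction(v * v, 2)]
PAIRWISE = {
    (0, 1): lambda a, b: Fraction(int(a == b)),
    (1, 2): lambda a, b: Fraction(-abs(a - b)),
}


def E(x):
    """Global objective E(x)."""
    unary = sum(f(v) for f, v in zip(UNARY, x))
    pairwise = sum(g(x[i], x[j]) for (i, j), g in PAIRWISE.items())
    return unary + pairwise


def E_i(i, x):
    """Agent i's sub-objective: its unary term plus half of each pairwise term touching x_i."""
    shared = sum(g(x[a], x[b]) / 2 for (a, b), g in PAIRWISE.items() if i in (a, b))
    return UNARY[i](x[i]) + shared


# The sub-objectives add back up to E(x) on every assignment.
print(all(sum(E_i(i, x) for i in range(N)) == E(x) for x in product(DOMAIN, repeat=N)))
```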

In terms of a multi-agent system, let us assign $E_i(x)$ as the objective function of agent i, for i = 1, 2, ..., n, so that there are n agents in the system in total. The objective of the system is to maximize $E(x)$, and the objective of each agent i is to maximize $E_i(x)$. In game theory, $E_i(x)$ is called the utility function of agent i. In this paper, $E(x)$ is also called the global utility function of the game.

A simple form of cooperative optimization is defined as an iterative update of the assignment function of each agent as follows:

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/\hbar} \prod_{j \neq i} p_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n, \qquad (1)$$

where $\sum_{\sim x_i}$ stands for the summation over all variables except $x_i$, and $\hbar$ is a small positive constant. $p_i(x_i, t)$ is defined as

$$p_i(x_i, t) = \frac{(\Psi_i(x_i, t))^{\alpha}}{\sum_{x_i} (\Psi_i(x_i, t))^{\alpha}}, \qquad (2)$$

where $\alpha$ is a positive real-valued parameter.

By this definition, $p_i(x_i, t)$ behaves like a probability function, satisfying

$$\sum_{x_i} p_i(x_i, t) = 1.$$

It is therefore called the assignment probability function. It defines the probability-like soft decision for assigning the variable $x_i$ at time instance t.
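Equation (2) is easy to state in code. The sketch below (the function name is ours, not the paper's) shows how the exponent $\alpha$ moves a soft decision toward a hard one: with $\alpha = 1$ the probabilities are proportional to $\Psi$, while a large $\alpha$ concentrates almost all of the mass on the best value. The values of $\Psi$ are hypothetical.

```python
def assignment_probability(psi, alpha):
    """Eq. (2): p(x) = psi(x)**alpha / sum_x psi(x)**alpha for a positive psi.

    Dividing by max(psi) first leaves the result unchanged but avoids overflow for large alpha.
    """
    peak = max(psi)
    powered = [(v / peak) ** alpha for v in psi]
    total = sum(powered)
    return [v / total for v in powered]


psi = [1.0, 2.0, 3.0]  # hypothetical assignment state function over three values
for alpha in (1, 5, 50):
    print(alpha, [round(p, 4) for p in assignment_probability(psi, alpha)])
```

As $\alpha$ grows, the distribution approaches the hard decision that puts all probability on $\arg\max_x \Psi(x, t)$.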
The original assignment function $\Psi_i(x_i, t)$ is called the assignment state function. That is, the state of agent i at time instance t is represented by its assignment state function $\Psi_i(x_i, t)$. From Eq. (2) we can see that the assignment probability function $p_i(x_i, t)$ is the assignment state function $\Psi_i(x_i, t)$ raised to the power $\alpha$ and normalized. To show this relationship, the assignment probability function $p_i(x_i, t)$ is also written as $\overline{(\Psi_i(x_i, t))^{\alpha}}$ in the following discussion, with the bar standing for the normalization.

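With this notation, the iterative update function (1) can be rewritten as

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/\hbar} \prod_{j \neq i} \overline{(\Psi_j(x_j, t-1))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n. \qquad (3)$$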

By substituting Eq. (1) into Eq. (2), we obtain a mapping from the set of assignment probability functions to itself. Because this set (a product of probability simplices) is compact and convex and the mapping is continuous, a fixed point exists by the Brouwer fixed point theorem. Since a set of assignment state functions is uniquely determined by a set of assignment probability functions through Eq. (1), we can conclude that there exists at least one set of assignment state functions $\{\Psi_1^*(x_1), \Psi_2^*(x_2), \ldots, \Psi_n^*(x_n)\}$ such that

$$\Psi_i^*(x_i) = \sum_{\sim x_i} \Big( e^{E_i(x)/\hbar} \prod_{j \neq i} \overline{(\Psi_j^*(x_j))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n.$$
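To make the update concrete, here is a minimal sketch of the simple form, Eqs. (1) and (2), for two agents, with positive payoff tables standing in for $e^{E_i(x)/\hbar}$. The prisoner's-dilemma numbers are hypothetical. Note that the existence argument above does not say that this plain iteration reaches a fixed point; in this toy game the "defect" action dominates, so every strategy after the first update already puts most of its mass on it.

```python
# Hypothetical two-player game. U[i][a][b] > 0 plays the role of e^{E_i(x)/hbar}:
# the payoff to player i when it plays a and the other player plays b
# (action 0 = cooperate, action 1 = defect).
U = [
    [[3.0, 0.5], [5.0, 1.0]],  # player 0
    [[3.0, 0.5], [5.0, 1.0]],  # player 1 (symmetric game)
]


def normalize_power(psi, alpha):
    """Eq. (2): raise psi to the power alpha and normalise to a probability vector."""
    peak = max(psi)
    powered = [(v / peak) ** alpha for v in psi]
    total = sum(powered)
    return [v / total for v in powered]


def simple_form_step(p, alpha):
    """One synchronous update of Eq. (1) followed by Eq. (2) for both players."""
    new_psi, new_p = [], []
    for i in range(2):
        other = p[1 - i]
        # Eq. (1): sum over the other player's action, weighted by its previous strategy.
        psi_i = [sum(U[i][a][b] * other[b] for b in range(2)) for a in range(2)]
        new_psi.append(psi_i)
        new_p.append(normalize_power(psi_i, alpha))
    return new_psi, new_p


alpha = 5.0
p = [[0.5, 0.5], [0.5, 0.5]]  # start from uniform strategies
for _ in range(100):
    psi, p = simple_form_step(p, alpha)

for i in range(2):
    best = max(psi[i])
    expected = sum(v * q for v, q in zip(psi[i], p[i]))
    print(f"player {i}: strategy={p[i]}, best - expected = {best - expected:.4f}")
```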

Without loss of generality, let the utility function $U_i(x)$ of agent i be defined as

$$U_i(x) = e^{E_i(x)/\hbar}.$$

In this case, agent i tries to maximize the utility function $U_i(x)$ instead of the objective function $E_i(x)$; the two tasks are equivalent because the exponential function is monotonically increasing. Accordingly, the simple form of cooperative optimization (1) becomes

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( U_i(x) \prod_{j \neq i} p_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n, \qquad (4)$$

where $p_j(x_j, t-1)$ is the assignment probability function defined by Eq. (2).

From Eq. (4) we can see that the assignment state function $\Psi_i(x_i, t)$ for a given variable value $x_i = a$ is the payoff of agent i with the action a (taking only the action labeled by the value a) while the other players use the mixed strategies $p_j(x_j, t-1)$ (for the j's where $j \neq i$). An action $a_1$ is better than another action $a_2$ if $\Psi_i(a_1, t) > \Psi_i(a_2, t)$. The expected payoff of agent i under the mixed strategy $p_i(x_i, t)$ is

$$\sum_{x_i} \Psi_i(x_i, t)\, p_i(x_i, t).$$

The assignment probability function $p_i(x_i, t)$ is also called the strategy of agent i in game theory. The set of strategies $\{p_1(x_1, t), p_2(x_2, t), \ldots, p_n(x_n, t)\}$ is called a strategy profile, denoted as p.
The best action of agent i at time t is defined as the one with the highest payoff, i.e., the $x_i$ that maximizes $\Psi_i(x_i, t)$. Assume that agent i has $m_i$ actions in total, and assume further that $\alpha > 1$. Based on the definition of $p_i(x_i, t)$ given in (2), we examine the difference between the best payoff $\max_{x_i} \Psi_i(x_i, t)$ and the expected payoff $\sum_{x_i} \Psi_i(x_i, t)\, p_i(x_i, t)$. It is straightforward to derive that this difference satisfies the following inequality:

$$0 \le \max_{x_i} \Psi_i(x_i, t) - \sum_{x_i} \Psi_i(x_i, t)\, p_i(x_i, t) \le \Big( \frac{m_i}{e} \max_{x_i} \Psi_i(x_i, t) \Big) \alpha^{-1}.$$

Obviously, the difference can be made arbitrarily small when the parameter $\alpha$ is sufficiently large. That is, the difference goes to zero as $\alpha \to \infty$:

$$\lim_{\alpha \to \infty} \Big( \max_{x_i} \Psi_i(x_i, t) - \sum_{x_i} \Psi_i(x_i, t)\, p_i(x_i, t) \Big) = 0.$$
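The gap can also be checked numerically. The sketch below draws a hypothetical positive assignment state function over $m_i = 5$ actions, applies Eq. (2) for increasing $\alpha$, and compares the gap with the bound $\frac{m_i}{e\alpha} \max_{x_i} \Psi_i(x_i, t)$; the random values are arbitrary.

```python
import math
import random


def assignment_probability(psi, alpha):
    """Eq. (2) for a positive psi, computed stably by first dividing by max(psi)."""
    peak = max(psi)
    powered = [(v / peak) ** alpha for v in psi]
    total = sum(powered)
    return [v / total for v in powered]


def payoff_gap(psi, alpha):
    """Best payoff minus expected payoff under the strategy given by Eq. (2)."""
    p = assignment_probability(psi, alpha)
    return max(psi) - sum(v * q for v, q in zip(psi, p))


rng = random.Random(0)
psi = [rng.uniform(0.1, 10.0) for _ in range(5)]  # hypothetical payoffs of 5 actions
m = len(psi)

rows = []
for alpha in (1.0, 2.0, 10.0, 50.0, 200.0):
    gap = payoff_gap(psi, alpha)
    bound = m / (math.e * alpha) * max(psi)
    rows.append((alpha, gap, bound))
    print(f"alpha={alpha:6.1f}  gap={gap:.6f}  bound={bound:.6f}")
```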

Based on the Brouwer fixed point theorem, the simple form (4) must also have an equilibrium (a fixed point) for any $\alpha > 0$. That is, given any $\alpha > 0$, there exists at least one set of assignment state functions $\{\Psi_1^*(x_1), \Psi_2^*(x_2), \ldots, \Psi_n^*(x_n)\}$ such that

$$\Psi_i^*(x_i) = \sum_{\sim x_i} \Big( U_i(x) \prod_{j \neq i} \overline{(\Psi_j^*(x_j))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n.$$

At the equilibrium, we know from the previous discussion that, for each agent i, the difference between its best payoff $\max_{x_i} \Psi_i^*(x_i)$ and its expected payoff $\sum_{x_i} \Psi_i^*(x_i)\, p_i^*(x_i)$ can be made arbitrarily small by choosing a sufficiently large parameter $\alpha$. That is, for any i,

$$\lim_{\alpha \to \infty} \Big( \max_{x_i} \Psi_i^*(x_i) - \sum_{x_i} \Psi_i^*(x_i)\, p_i^*(x_i) \Big) = 0. \qquad (5)$$



Given a strategy profile $p^*$, it is a Nash equilibrium if and only if, for every agent, its best payoff is equal to its expected payoff $\sum_{x_i} \Psi_i^*(x_i)\, p_i^*(x_i)$. That is, for any i,

$$\max_{x_i} \Psi_i^*(x_i) - \sum_{x_i} \Psi_i^*(x_i)\, p_i^*(x_i) = 0. \qquad (6)$$

Comparing statement (5) with statement (6), we can conclude that any equilibrium of the simple form of cooperative optimization (4) can be arbitrarily close to a Nash equilibrium if the parameter $\alpha$ is sufficiently large. The simple form not only offers a general definition of a new kind of equilibrium, but also provides an algorithmic method for finding such equilibria.

A very large value of the parameter $\alpha$ stands for a very selfish agent. To make this point clear, consider Eq. (2), used for computing the strategy of each agent at time instance t. With a very large $\alpha$, each agent greatly amplifies the probability of its best action(s), those with the best payoff at time instance t. At the same time, the probabilities of its suboptimal actions, which offer lower payoffs than the best one, are suppressed to nearly zero. Equivalently, we can say that each agent is selfish because he is interested only in maximizing his own payoff.

This observation explains why a Nash equilibrium may not be stable: it can be extremely sensitive to perturbations and errors introduced by the communication among agents, and to variations in the utility functions. For example, the slightest variation in a utility function could lead to a dramatic shift of the equilibrium from one point in the strategy profile space to another. It will be hard for an algorithmic method to converge to an unstable equilibrium purely through iteration.

In summary, the pursuit of the maximal payoff by every player in a game often makes it difficult for the game to reach an equilibrium. Even if an equilibrium is found, it could be unstable, very sensitive to small changes in the utility functions. Furthermore, the final payoff for each player in the game may not be good enough. Can the situation be improved if we simply reduce the selfishness of the agents by tuning down the parameter $\alpha$?



3 Towards the Global Optimum 

It is desirable to define equilibria that are stable and easy to find. It would be ideal if a game had one and only one equilibrium, and that equilibrium were also the social optimum (the global optimum of the global utility function $E(x)$). Often, a social optimum of a society leads to better payoffs for the individuals in it. At the least, it is the best on average for each individual, which can be realized to some degree through wealth redistribution by a social welfare system. This section shows that both properties are attainable if the simple form of cooperative optimization (4) is converted back to the original general form of cooperative optimization and the value of the parameter $\alpha$ is reduced below a certain threshold.

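In the iterative update function (3) defining the simple form, we can replace the constant $\alpha$ by $\lambda(t) w_{ij}$, where both $\lambda(t)$ and $w_{ij}$ are constant parameters. With that substitution, and dropping the normalization (which only multiplies $\Psi_i(x_i, t)$ by a factor independent of $x_i$), the equation becomes

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/\hbar} \prod_{j \neq i} (\Psi_j(x_j, t-1))^{\lambda(t) w_{ij}} \Big). \qquad (7)$$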

Further note that a maximization operator can be approximated by a summation operator as follows:

$$\max_x e^{f(x)/\hbar} \approx \sum_x e^{f(x)/\hbar}$$

(under the assumption that the function $f(x)$ has a unique global maximum).
This approximation becomes accurate, in relative terms, as $\hbar \to 0^+$, i.e.,

$$\lim_{\hbar \to 0^+} \frac{\sum_x e^{f(x)/\hbar}}{\max_x e^{f(x)/\hbar}} = 1.$$
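Numerically, this approximation is the familiar log-sum-exp smoothing of the maximum. The sketch below (the values of f are hypothetical) shows the ratio tending to 1 and, equivalently, $\hbar \ln \sum_x e^{f(x)/\hbar}$ approaching $\max_x f(x)$, overestimating it by at most $\hbar \ln m$ when x takes m values.

```python
import math


def smoothed_max(f_values, hbar):
    """hbar * log(sum(exp(f / hbar))), computed stably by factoring out the maximum."""
    top = max(f_values)
    return top + hbar * math.log(sum(math.exp((f - top) / hbar) for f in f_values))


def sum_to_max_ratio(f_values, hbar):
    """sum_x e^{f(x)/hbar} divided by max_x e^{f(x)/hbar}."""
    top = max(f_values)
    return sum(math.exp((f - top) / hbar) for f in f_values)


f = [1.0, 0.5, -2.0]  # hypothetical values of f(x) with a unique maximum
for hbar in (1.0, 0.1, 0.01):
    print(hbar, smoothed_max(f, hbar), sum_to_max_ratio(f, hbar))
```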
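With this approximation, the iterative update function (7) becomes

$$\Psi_i(x_i, t) = \max_{\sim x_i} \Big( e^{E_i(x)/\hbar} \prod_{j \neq i} (\Psi_j(x_j, t-1))^{\lambda(t) w_{ij}} \Big), \qquad (8)$$

where $\max_{\sim x_i}$ stands for the maximization over all variables except $x_i$.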

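Taking the logarithm of both sides, multiplying by $\hbar$, and reusing the symbol $\Psi_i(x_i, t)$ for $\hbar \ln \Psi_i(x_i, t)$, we obtain the following maximization problem:

$$\Psi_i(x_i, t) = \max_{\sim x_i} \Big( E_i(x) + \lambda(t) \sum_{j \neq i} w_{ij} \Psi_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n. \qquad (9)$$

This is the original general form of cooperative optimization.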
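A minimal sketch of the general form (9) on a hypothetical three-agent problem: each agent prefers to agree with the other two and has a small private bias. The toy utilities, the propagation weights $w_{ij} = 1/2$ for $j \neq i$, and $\lambda = 0.5$ are assumptions for illustration; the inner maximization is done by brute-force enumeration, which is feasible only for tiny problems.

```python
from itertools import product

N, DOMAIN = 3, (0, 1)
BIAS = (0.3, -0.1, 0.1)  # hypothetical private preference of agent i for x_i = 1
# Hypothetical propagation matrix: w_ij = 1/2 for j != i (each column sums to 1).
W = [[0.0 if i == j else 0.5 for j in range(N)] for i in range(N)]


def E_i(i, x):
    """Toy utility of agent i: 0.5 per other agent it agrees with, plus its bias."""
    agree = sum(1 for j in range(N) if j != i and x[j] == x[i])
    return 0.5 * agree + BIAS[i] * x[i]


def general_form_step(psi_prev, lam):
    """Eq. (9): psi_i(x_i) = max over the other variables of the compromised utility."""
    psi = []
    for i in range(N):
        best = {v: float("-inf") for v in DOMAIN}
        for x in product(DOMAIN, repeat=N):
            compromised = E_i(i, x) + lam * sum(
                W[i][j] * psi_prev[j][x[j]] for j in range(N) if j != i
            )
            best[x[i]] = max(best[x[i]], compromised)
        psi.append(best)
    return psi


def best_values(psi):
    """The value of each x_i with the highest assignment payoff."""
    return tuple(max(DOMAIN, key=lambda v: psi_i[v]) for psi_i in psi)


psi = [{v: 0.0 for v in DOMAIN} for _ in range(N)]
for t in range(1, 4):
    psi = general_form_step(psi, lam=0.5)
    print(t, best_values(psi), [round(psi_i[1] - psi_i[0], 3) for psi_i in psi])
```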

In this form, each agent optimizes the compromised utility function defined on the right side of the above equation. It is called the compromised utility function in the sense that it is a linear combination of the original utility function $E_i(x)$ of agent i and the assignment state functions $\Psi_j(x_j, t-1)$ of the other agents j at the previous time instance t − 1. The assignment state function $\Psi_i(x_i, t)$ stores the best payoffs, in terms of the compromised utility function, for the different values of the variable $x_i$. Therefore, it is also called the assignment payoff function in the general form.

In summary, the general form of cooperative optimization defines a multi-agent system. In this system, every agent compromises its own utility function by taking into account the possible payoffs of the other agents, and all agents optimize their own compromised utility functions simultaneously, in parallel. Therefore, such a multi-agent system is distributed and autonomous, making it highly scalable and less vulnerable than a centralized one to perturbations and disruptions of the agents in the system.

Given the assignment payoff function $\Psi_i(x_i, t)$ of agent i at iteration time instance t, let $\tilde{x}_i(t)$ be the value of $x_i$ maximizing the function, i.e.,

$$\tilde{x}_i(t) = \arg\max_{x_i} \Psi_i(x_i, t). \qquad (10)$$

It represents the best value of $x_i$ at iteration time instance t, the one that gives the highest payoff. In other words, assigning $\tilde{x}_i(t)$ to $x_i$ maximizes the compromised utility function defined on the right side of (9). The solution of the system at iteration time instance t is the collection of these best values,

$$(\tilde{x}_1(t), \tilde{x}_2(t), \ldots, \tilde{x}_n(t)), \quad \text{or simply } \tilde{x}(t).$$

The parameters $w_{ij}$ ($1 \le i, j \le n$) in (9) control the propagation of the assignment payoff functions $\Psi_j(x_j, t)$ ($j = 1, 2, \ldots, n$) among the agents in the system. Together, the $w_{ij}$ form an $n \times n$ matrix called the propagation matrix W. For $\sum_i E_i(x)$ to be the global utility function being maximized, the propagation matrix $W = (w_{ij})_{n \times n}$ is required to be non-negative and to satisfy

$$\sum_{i=1}^{n} w_{ij} = 1, \quad \text{for } j = 1, 2, \ldots, n.$$

The propagation matrix W has exactly the same properties as a transition matrix describing a Markov chain. For the assignment payoff functions $\Psi_j(x_j, t)$ to propagate uniformly among all the agents, the propagation matrix W is required to be irreducible and aperiodic. A matrix W is called reducible if there exists a permutation matrix P such that $PWP^T$ has the block form

$$\begin{pmatrix} A & B \\ O & C \end{pmatrix},$$

where A and C are square blocks and O is a zero block; otherwise, W is irreducible.
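These conditions on the propagation matrix can be tested mechanically. The sketch below (helper names are ours) checks the column-sum condition, irreducibility via the standard criterion that $(I + W)^{n-1}$ is entrywise positive, and irreducibility together with aperiodicity via the standard primitivity test that $W^{(n-1)^2+1}$ is entrywise positive (Wielandt's bound) for non-negative matrices. The three example matrices are hypothetical.

```python
def mat_mult(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(A, k):
    """A**k by repeated multiplication (fine for the small n used here)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mult(result, A)
    return result


def is_positive(A):
    return all(a > 0 for row in A for a in row)


def is_column_stochastic(W, tol=1e-12):
    """Non-negative entries and sum_i w_ij = 1 for every column j."""
    n = len(W)
    nonnegative = all(w >= 0 for row in W for w in row)
    return nonnegative and all(abs(sum(W[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))


def is_irreducible(W):
    """A non-negative n x n matrix is irreducible iff (I + W)**(n - 1) is entrywise positive."""
    n = len(W)
    shifted = [[W[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return is_positive(mat_pow(shifted, n - 1))


def is_irreducible_and_aperiodic(W):
    """Primitive (irreducible and aperiodic) iff W**((n - 1)**2 + 1) is entrywise positive."""
    n = len(W)
    return is_positive(mat_pow(W, (n - 1) ** 2 + 1))


W3 = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # each agent listens to the other two
W2 = [[0.0, 1.0], [1.0, 0.0]]                              # irreducible but periodic (period 2)
W_red = [[1.0, 0.5], [0.0, 0.5]]                           # reducible: already in block form

for name, W in (("W3", W3), ("W2", W2), ("W_red", W_red)):
    print(name, is_column_stochastic(W), is_irreducible(W), is_irreducible_and_aperiodic(W))
```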

Given a constant cooperation strength $\lambda(t)$ that is non-negative and less than 1, i.e., $\lambda(t) = \lambda$ with $0 \le \lambda < 1$ for every time instance t, the general form of cooperative optimization (9) has one and only one equilibrium. It always converges to this unique equilibrium at an exponential rate, regardless of initial conditions and perturbations.
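One way to see why a constant $\lambda < 1$ suffices (our reasoning, offered as a sketch rather than as the paper's proof): for two sets of assignment payoff functions $\Psi$ and $\Psi'$, after one step of (9) the quantity $\max_{x_i} |\Psi_i - \Psi'_i|$ is at most $\lambda \sum_{j \neq i} w_{ij} \max_{x_j} |\Psi_j - \Psi'_j|$. Summing over i and using the fact that each column of W sums to at most 1 shows that the update is a contraction with factor $\lambda$ in the norm $\sum_i \max_{x_i} |\Psi_i(x_i)|$, so the Banach fixed point theorem applies. The code below checks this numerically on a hypothetical three-agent problem started from two different initial conditions; the toy utilities and weights are assumptions.

```python
from itertools import product

N, DOMAIN, LAM = 3, (0, 1), 0.5
BIAS = (0.3, -0.1, 0.1)  # hypothetical toy problem: agreement bonus plus private biases
W = [[0.0 if i == j else 0.5 for j in range(N)] for i in range(N)]


def E_i(i, x):
    """Toy utility of agent i: 0.5 per other agent it agrees with, plus its bias."""
    agree = sum(1 for j in range(N) if j != i and x[j] == x[i])
    return 0.5 * agree + BIAS[i] * x[i]


def step(psi_prev):
    """Eq. (9) with a constant cooperation strength LAM."""
    psi = []
    for i in range(N):
        best = {v: float("-inf") for v in DOMAIN}
        for x in product(DOMAIN, repeat=N):
            val = E_i(i, x) + LAM * sum(W[i][j] * psi_prev[j][x[j]] for j in range(N) if j != i)
            best[x[i]] = max(best[x[i]], val)
        psi.append(best)
    return psi


def distance(a, b):
    """sum_i max_{x_i} |a_i(x_i) - b_i(x_i)|."""
    return sum(max(abs(a[i][v] - b[i][v]) for v in DOMAIN) for i in range(N))


psi_a = [{0: 0.0, 1: 0.0} for _ in range(N)]   # one initial condition
psi_b = [{0: 5.0, 1: -3.0} for _ in range(N)]  # a very different one
distances = [distance(psi_a, psi_b)]
for _ in range(60):
    psi_a, psi_b = step(psi_a), step(psi_b)
    distances.append(distance(psi_a, psi_b))

print(distances[:5], distances[-1])
```

The distance shrinks at least by the factor $\lambda$ per iteration, which is the exponential convergence rate stated above.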

To be more general, assume that agent i's utility function $E_i(x)$ is defined on the variable set $X_i$. Recall that the solution at iteration t is $\tilde{x}(t)$ (see (10)). Let $\tilde{x}(t)(X_i)$ denote the restriction of the solution to $X_i$. The solution $\tilde{x}(t)$ is called a consensus solution if it is the optimal solution of every optimization problem defined by (9). That is,

$$\tilde{x}(t)(X_i) = \arg\max_{X_i} \Big( E_i(x) + \lambda(t) \sum_{j \neq i} w_{ij} \Psi_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n. \qquad (11)$$

It is important to note that if the general form of cooperative optimization discovers a consensus solution at any time instance t, then it must be a pure strategy Nash equilibrium. This conclusion is obvious from the definition of a consensus solution given in Eq. (11), under which no agent i would get a higher payoff by unilaterally changing its best assignment $\tilde{x}_i(t)$. Furthermore, if it converges to a consensus equilibrium with a constant $\lambda$ satisfying $0 \le \lambda < 1$, then it is both a pure strategy Nash equilibrium and the social optimum, defined as the global optimum of the global utility function of the game,

$$E_1(x) + E_2(x) + \cdots + E_n(x).$$

When a game has an enormous number of Nash equilibria, it is important to find the globally optimal one.
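The consensus test (11) and its link to the social optimum can be tried on a hypothetical three-agent problem. In the sketch below, the agent with index 1 privately prefers the value 0, while the others prefer 1, and every agent gains from agreeing. With $\lambda = 0.5$ the iteration reaches a consensus solution that coincides with the brute-force maximizer of the global utility function; with $\lambda = 0.3$ no consensus is reached. These outcomes are specific to the assumed toy utilities and weights.

```python
from itertools import product

N, DOMAIN = 3, (0, 1)
BIAS = (0.3, -0.1, 0.1)  # hypothetical: agent 1 privately prefers x_1 = 0
W = [[0.0 if i == j else 0.5 for j in range(N)] for i in range(N)]


def E_i(i, x):
    """Toy utility of agent i: 0.5 per other agent it agrees with, plus its bias."""
    agree = sum(1 for j in range(N) if j != i and x[j] == x[i])
    return 0.5 * agree + BIAS[i] * x[i]


def compromised(i, x, psi_prev, lam):
    """Right-hand side of Eq. (9) for agent i at the full assignment x."""
    return E_i(i, x) + lam * sum(W[i][j] * psi_prev[j][x[j]] for j in range(N) if j != i)


def step(psi_prev, lam):
    """Eq. (9), by brute-force enumeration of all assignments."""
    psi = [{v: float("-inf") for v in DOMAIN} for _ in range(N)]
    for x in product(DOMAIN, repeat=N):
        for i in range(N):
            psi[i][x[i]] = max(psi[i][x[i]], compromised(i, x, psi_prev, lam))
    return psi


def run(lam, iterations=100):
    """Iterate Eq. (9); return x~(t) from Eq. (10) and whether it satisfies Eq. (11)."""
    psi_prev = [{v: 0.0 for v in DOMAIN} for _ in range(N)]
    psi = step(psi_prev, lam)
    for _ in range(iterations):
        psi_prev, psi = psi, step(psi, lam)
    x_tilde = tuple(max(DOMAIN, key=lambda v: psi_i[v]) for psi_i in psi)
    # Consensus: x~ maximizes every agent's compromised utility over all of x.
    consensus = all(
        max(product(DOMAIN, repeat=N), key=lambda x: compromised(i, x, psi_prev, lam)) == x_tilde
        for i in range(N)
    )
    return x_tilde, consensus


def global_optimum():
    """Brute-force maximizer of E(x) = E_1(x) + ... + E_n(x)."""
    return max(product(DOMAIN, repeat=N), key=lambda x: sum(E_i(i, x) for i in range(N)))


for lam in (0.3, 0.5):
    x_tilde, consensus = run(lam)
    print(f"lambda={lam}: x~={x_tilde}, consensus={consensus}, social optimum={global_optimum()}")
```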

4 Conclusions 

This paper presented a multi-agent system for a constructive generalization of Nash equilibrium. The dynamics of the system are defined by a general global optimization method called cooperative optimization. The selfishness of each agent is defined by a parameter used in computing the agent's strategy during each iteration. Given any positive value of this parameter, the system always has an equilibrium. In particular, any equilibrium of the system can be arbitrarily close to a Nash equilibrium when the parameter controlling the selfishness is sufficiently large. In this case, each agent in the system is interested only in maximizing its own payoff.

This constructive definition offers insight into the computational difficulty of finding a Nash equilibrium. It also offers a cooperative perspective on the instability of a Nash equilibrium. This paper shows that when the selfishness of agents is controlled at some level, better and more stable equilibria can be reached by the system. Below a certain level of selfishness, the system has only one equilibrium, and it converges to that equilibrium at an exponential rate from any initial conditions. When it is also a consensus equilibrium, it must be the global optimum.